SupR: Multithreaded and Distributed R

Internal-Release

Computer Operating Systems

Download and install

Define environment variables

Run SupR to check whether everything is OK

Libraries

Packages for statistical analysis are yet to be developed.

Documentation

Examples

Here are some different variants of this example: EM01.R, EM02.R.

Acknowledgements

I am grateful to have colleagues who are visionary in big data analysis in general and statistical computing in R in particular. I thank Professor Bill Cleveland for our long-running conversations on RHIPE, an interface between R and Hadoop for large and complex data analysis, and for the opportunity to access his Hadoop clusters.

Professor Michael Zhu brought (Apache) Spark and Scala to my attention when it was first made publicly available through Databricks. Long discussions with Michael Zhu, which began before then, have been particularly inspiring for the development of user-friendly distributed computing software for data analysis.

This work was partially supported by the National Science Foundation under Grant No. DMS-1316922.